On optimization, parallelization and convergence of the Expectation-Maximization algorithm for finite mixtures of Bernoulli distributions

Author

  • Dumitru Erhan
Abstract

This paper reviews the maximum likelihood estimation problem and its solution via the Expectation-Maximization (EM) algorithm. Emphasis is placed on finite mixtures of multivariate Bernoulli distributions for modeling 0-1 data. General ideas about convergence and non-identifiability are presented. We discuss improvements to the algorithm and describe in detail what we believe are novel ideas in the treatment of the topic: (1) identifying unique data points and reusing that information, (2) parallelizing the algorithm in a multi-threaded fashion, and (3) cluster assignment options. Experiments demonstrate that most of our approaches produce good results and encourage further research on the topic.
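The first idea in the abstract, computing on unique data points only, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `fit_bernoulli_mixture`, its parameters, the random initialization, and the clipping constant `eps` are all assumptions made for illustration. The sketch relies on the fact that identical 0-1 rows receive identical responsibilities, so the E-step only needs to run once per distinct row, weighted by how often that row occurs. Multi-threading and cluster assignment options are not covered.

```python
import numpy as np

def fit_bernoulli_mixture(X, n_components, n_iter=50, seed=0, eps=1e-9):
    """EM for a finite mixture of multivariate Bernoullis on 0-1 data X (n x d).

    Hypothetical helper for illustration; returns mixing weights, Bernoulli
    parameters, and the log-likelihood recorded at each iteration.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Collapse duplicate rows: responsibilities depend only on the row's
    # contents, so they are computed once per unique row.
    U, counts = np.unique(X, axis=0, return_counts=True)
    U = U.astype(float)
    d = U.shape[1]

    pi = np.full(n_components, 1.0 / n_components)           # mixing weights
    theta = rng.uniform(0.25, 0.75, size=(n_components, d))  # Bernoulli means
    log_liks = []

    for _ in range(n_iter):
        # E-step (log space): log pi_k + sum_j log Bernoulli(u_j | theta_kj)
        log_p = U @ np.log(theta).T + (1.0 - U) @ np.log(1.0 - theta).T + np.log(pi)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # Log-likelihood of the full data set, using the row counts as weights.
        log_liks.append(float(counts @ log_norm.ravel()))

        # M-step: each unique row contributes in proportion to its count.
        w = resp * counts[:, None]
        nk = w.sum(axis=0)
        pi = np.clip(nk / n, eps, None)
        pi /= pi.sum()
        theta = np.clip((w.T @ U) / (nk[:, None] + eps), eps, 1.0 - eps)

    return pi, theta, log_liks
```

With binary data of low dimension, the number of unique rows can be much smaller than the number of observations, which is where this kind of reuse would be expected to save the most work.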


Similar resources

Global Convergence of Model Reference Adaptive Search for Gaussian Mixtures

While the Expectation-Maximization (EM) algorithm is a popular and convenient tool for mixture analysis, it only produces solutions that are locally optimal, and thus may not achieve the globally optimal solution. This paper introduces a new algorithm, based on the global optimization algorithm Model Reference Adaptive Search (MRAS), designed to produce globally-optimal solutions in the estimat...


Practical Identifiability of Finite Mixtures of Multivariate Bernoulli Distributions

The class of finite mixtures of multivariate Bernoulli distributions is known to be nonidentifiable; that is, different values of the mixture parameters can correspond to exactly the same probability distribution. In principle, this would mean that sample estimates using this model would give rise to different interpretations. We give empirical support to the fact that estimation of this class ...


Bayesian Mixtures of Bernoulli Distributions

The mixture of Bernoulli distributions [6] is a technique that is frequently used for the modeling of binary random vectors. They differ from (restricted) Boltzmann Machines in that they do not model the marginal distribution over the binary data space X as a product of (conditional) Bernoulli distributions, but as a weighted sum of Bernoulli distributions. Despite the non-identifiability of th...


Multivariate Structural Bernoulli Mixtures for Recognition of Handwritten Numerals

As shown recently, the structural optimization of probabilistic neural networks can be included into EM algorithm by introducing a special type of multivariate Bernoulli mixtures. However, the underlying loglikelihood criterion is known to be multimodal in case of mixtures and therefore the EM iteration process may be starting-point dependent. In the present paper we discuss the possibility of ...


Mixture Modeling of DNA Copy Number Amplification Patterns in Cancer

DNA copy number amplifications are hallmarks of many cancers. In this work we analyzed data of genome-wide DNA copy number amplifications collected from more than 4500 neoplasm cases. Based on the 0-1 representation of the data, we trained finite mixtures of multivariate Bernoulli distributions using the EM algorithm to describe the inherent structure in the data. The resulting component distri...



Publication date: 2003